## ATCA for Digital Signal Processing

**Rob Persons** 

**Sr. Field Applications Engineer** 



## Agenda

- Company Intro
- Brief Introduction to Advanced Telecom Computing Architecture (ATCA)
- Basic Digital Signal Processing Concepts
- New ATCA Technologies to Address DSP Applications
  - High Performance Multi-Core Processors
  - Updated Vector Processing Units in Cores
  - High Speed Fabrics in Backplane
  - Advanced Flow Control Software on Switches and Blades
  - Repurposing Packet Processing Software to target DSP Applications



## Emerson At-A-Glance 2012

#### US \$24.4 Billion in Sales



Headquarters in St. Louis, Missouri, USA (NYSE: EMR)

Diversified global manufacturer and technology provider

Approximately 133,000 employees worldwide

- Manufacturing and/or sales presence in more than 150 countries
- Over 200 manufacturing locations around the world
- No. 120 on 2012 FORTUNE 500 list of America's largest corporations
- Founded in 1890



## What is AdvancedTCA?

- An open standard (COTS), developed 10+ years ago and deployed in all major telecom networks
- An ideal basis for a common platform, on which many applications can be built
- The standard covers shelves, boards, mezzanines, and management
- Systems are 19" wide and are designed to fit in 600mm deep racks
- Current ATCA Chassis can support 350W+ per slot, but can be limited to 200W
- High speed 10G and 40G internal data fabrics now deploying
- Blades are 8U (14") high and have no fans

## AdvancedTCA®







#### Benefits to the Use of ATCA for DSP Processing

- Multi-core Xeon Processors Well Suited to Process Complex Data
- ATCA Server Blades Efficiently Supply Many Cores and a Great Deal of Memory to Solve Problems
- ATCA is Inherently Rugged
  - Shipboard, Manned Airborne, and Transit Case Applications use it today
- Open Standard with Many Vendors
- Other Bussed Architectures add Cost for Ruggedization that Many Benign Environment Applications do not Need
- Aggressive Roadmap of Products Targeting Algorithm Processing Blades (Tick-Tock)



### **Basic DSP Concepts**



- Sensors Detect Targets
- High Speed Interface Transfers to Rack with Computing Equipment
- Analog Tracking Data is Either
  - Converted to Digital at the sensor
  - Converted to Digital at the DSP Processing Unit
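
Wherever the conversion happens, the analog-to-digital step can be illustrated with a minimal sketch. The 12-bit resolution, 100 MHz sample rate, and four channels below are hypothetical values chosen only for illustration; they are not figures from this presentation.

```python
import math

def quantize(sample: float, bits: int, full_scale: float = 1.0) -> int:
    """Map an analog value in [-full_scale, +full_scale] to a signed integer code."""
    levels = 1 << (bits - 1)                     # e.g. 2048 for a 12-bit converter
    code = math.floor(sample / full_scale * levels)
    return max(-levels, min(levels - 1, code))   # clip to the converter's range

def link_rate_bps(sample_rate_hz: float, bits: int, channels: int = 1) -> float:
    """Raw digital data rate the sensor link must carry (no framing overhead)."""
    return sample_rate_hz * bits * channels

# Digitize one cycle of a sine wave sampled at 8 points (hypothetical input).
codes = [quantize(math.sin(2 * math.pi * k / 8), bits=12) for k in range(8)]
print(codes)

# Hypothetical sensor: 4 channels, 12 bits, 100 MHz sample rate.
print(link_rate_bps(100e6, 12, channels=4) / 1e9, "Gb/s")
```

The second function shows why the choice matters: digitizing at the sensor means the high-speed interface carries the full raw sample stream.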



- Traditionally DSP Systems have been VME
- Trend Toward OpenVPX
  - High Speed Serial Replaces Parallel Bus
  - High Speed Serial can be PCIe or Serial RapidIO
- Multi-Processor Board that Supports High Level DSP Libraries
- Host Processor to Manage Data Flow
- Range of Ruggedization Levels Required based on Application



## High Performance Processing Core

- Intel<sup>®</sup> Xeon E5-2600 "Sandy Bridge"
  - 1.8 GHz per Core
  - 8 Multi-threaded Cores
  - 32nm
- 20MB L3 Cache, 2.5MB per Core
- Four Integrated Memory Controllers
- Dual QuickPath Interconnect (QPI) Links between the Two CPUs
- PCIe Gen3, 40 Lanes Per Socket
- Socket Ready for 10 Core "Ivy Bridge" (22nm)
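
As a rough worked example, the figures above can be turned into a theoretical peak for a dual-socket blade. The 16 single-precision FLOPs per cycle per core is an assumption not stated on the slide (one 256-bit AVX add plus one 256-bit AVX multiply issued per cycle); real kernels will achieve less.

```python
# Figures taken from the slide.
CORES_PER_SOCKET = 8
CLOCK_HZ = 1.8e9
L3_PER_CORE_MB = 2.5
SOCKETS_PER_BLADE = 2   # dual-CPU blade

# Assumption (not on the slide): each core issues one 256-bit AVX add and
# one 256-bit AVX multiply per cycle, i.e. 8 + 8 single-precision FLOPs.
SP_FLOPS_PER_CYCLE = 16

def peak_gflops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS; an upper bound, not a measured result."""
    return cores * clock_hz * flops_per_cycle / 1e9

l3_total_mb = CORES_PER_SOCKET * L3_PER_CORE_MB
socket_gflops = peak_gflops(CORES_PER_SOCKET, CLOCK_HZ, SP_FLOPS_PER_CYCLE)
blade_gflops = SOCKETS_PER_BLADE * socket_gflops

print(f"L3 cache per socket: {l3_total_mb:.1f} MB")
print(f"Peak per socket:     {socket_gflops:.1f} SP GFLOPS")
print(f"Peak per blade:      {blade_gflops:.1f} SP GFLOPS")
```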



## **Packet Processing Blades with 40G**

- Gen3 PCIe from Processor Supports 40G Ethernet Controllers
- Intel Supplies Alternative Coprocessor SKUs for Data Plane HW Offloading
  - HW Encryption/Decryption
  - 40G Offload Support for CPU
- 40G Direct Connection between ATCA Fabric and Processor
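
A quick back-of-the-envelope check of why Gen3 PCIe is enough for a 40G Ethernet controller. The sketch uses the PCIe Gen3 signaling rate (8 GT/s per lane) and 128b/130b line encoding; it ignores packet and protocol overhead, so the usable figure in practice is somewhat lower.

```python
import math

GEN3_TRANSFERS_PER_S = 8e9      # 8 GT/s per lane, per direction
ENCODING_EFFICIENCY = 128 / 130 # 128b/130b line encoding

def lane_gbps() -> float:
    """Raw payload bandwidth of one Gen3 lane in Gb/s (protocol overhead ignored)."""
    return GEN3_TRANSFERS_PER_S * ENCODING_EFFICIENCY / 1e9

def lanes_needed(port_gbps: float) -> int:
    """Minimum number of Gen3 lanes whose combined bandwidth covers the port rate."""
    return math.ceil(port_gbps / lane_gbps())

print(f"One Gen3 lane:  {lane_gbps():.2f} Gb/s")
print(f"x8 Gen3 link:   {8 * lane_gbps():.1f} Gb/s")
print(f"Lanes for 40G:  {lanes_needed(40.0)}")
print(f"40 lanes total: {40 * lane_gbps():.0f} Gb/s per socket")
```

Since link widths come in powers of two, a 40G controller would typically sit on a x8 link, which leaves headroom above the 40 Gb/s line rate.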



#### ATCA Dual "Sandy Bridge" Packet Processing Blade

- Cave Creek Acceleration Modules Offload 40G Traffic Processing from the Processors
- 40G Fabric Interfaces Efficiently Transfer Data to the Processor Cores
- Flow Control Software Running on the Boards Manages IP Dataflow to and from the Cores
- Interacts with a Specialized Packet Processing Version of the OS
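
One common way to manage dataflow to the cores is to hash each flow's 5-tuple so that every packet of a flow lands on the same core. The sketch below only illustrates that general idea; it is not the flow control software on the blade, and `core_for_flow` and the sample addresses are hypothetical.

```python
import zlib
from collections import Counter

def core_for_flow(src_ip: str, dst_ip: str, src_port: int,
                  dst_port: int, proto: str, num_cores: int) -> int:
    """Pick a core from the flow's 5-tuple; the same flow always maps to the same core."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_cores

NUM_CORES = 16  # two 8-core sockets

# Hypothetical traffic: 1000 UDP flows that differ only in source port.
flows = [("10.0.0.1", "10.0.0.2", 1024 + i, 5000, "udp") for i in range(1000)]
load = Counter(core_for_flow(*flow, NUM_CORES) for flow in flows)

for core in sorted(load):
    print(f"core {core:2d}: {load[core]} flows")
```

Keeping a flow on one core preserves packet order within the flow and keeps its state in that core's cache.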



## Intel®'s Advanced Vector Extensions (AVX)

- Introduced In Sandy Bridge Family of Processors
- Extends the 128-bit SIMD Instructions of SSE to 256 Bits
  - This potentially doubles floating-point throughput: one register holds eight single-precision values instead of four
- Each Core supports AVX Instructions
- Specific Instructions that Support Signal Processing Applications
- Intel<sup>®</sup> Supplies Optimized Libraries for AVX
  - Integrated Performance Primitives (IPP)
- Optimized VSIPL Libraries are also Available from 3<sup>rd</sup> Parties
- Haswell Processors will Support AVX2, which
  - Adds gather instructions to fetch non-contiguous data from memory
  - Promotes most 128-bit integer SIMD instructions to 256 bits
  - Adds vector shift instructions with a per-element variable shift count
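
The effect of the wider registers can be sketched in plain Python. This only models lane counts and the number of vector operations issued; it does not execute AVX instructions, and `simd_add` and `gather` are illustrative helpers, not Intel APIs.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """How many elements fit in one SIMD register."""
    return register_bits // element_bits

def simd_add(a, b, width):
    """Element-wise add in chunks of `width` lanes.

    Returns the result and the number of vector operations issued.
    """
    out, ops = [], 0
    for i in range(0, len(a), width):
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
        ops += 1
    return out, ops

def gather(memory, indices):
    """Model of a gather: load non-contiguous elements in one operation."""
    return [memory[i] for i in indices]

a = [float(i) for i in range(64)]
b = [1.0] * 64

sse_result, sse_ops = simd_add(a, b, lanes(128, 32))  # 4 floats per register
avx_result, avx_ops = simd_add(a, b, lanes(256, 32))  # 8 floats per register

print("SSE vector ops:", sse_ops)
print("AVX vector ops:", avx_ops)
print("gather:", gather([10, 20, 30, 40, 50], [4, 0, 2]))
```

Both widths produce the same result; the 256-bit version issues half as many vector operations, which is where the potential doubling comes from.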



#### (4x)10GBASE-KR Fabric Configuration (PICMG 3.1R2 "Option 3-KR")




**Network Power** 

## 40GBASE-KR4 Fabric Configuration

#### (PICMG 3.1R2 "Option 9-KR")




## Flow Control on ATCA Switches



## Intel<sup>®</sup>'s Data Plane Packet Processing Software (DPDK<sup>®</sup>)

- Data Plane Development Kit (DPDK<sup>®</sup>)
- Introduced with Nehalem-Class Xeon Processors
- Software Package to Optimize x86 Cores to Analyze IP Packet Data
- Optimized Data Plane Libraries and Optimized NIC Drivers in User Space
  - Runs under a special version of Linux that separates high-level control from algorithms running as threads on dedicated processor cores
  - Queue and Buffer Management, Packet Flow Classification and Poll Mode NIC Drivers
  - Low Overhead run-to-completion model optimized for fastest possible algorithm performance
- Additional DPDK<sup>®</sup> Libraries and Drivers
  - Memory Manager (Huge page tables to optimize performance)
  - Buffer Manager (Optimized memory allocation tool that eliminates need to lock)
  - Queue Manager (Manage incoming and outgoing data to the cores)
  - Flow Classification (IP flow management, optimized around Ethernet controller)
  - Poll Mode Drivers (User mode drivers eliminating interrupts for threads running algorithms)
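
The poll-mode, run-to-completion model can be sketched conceptually as follows. This is not the DPDK API: `Ring` and `poll_loop` are hypothetical names, and the ring is a simplified single-producer/single-consumer queue that needs no lock because each index is written by only one side.

```python
class Ring:
    """Fixed-size single-producer/single-consumer ring; one slot is kept empty."""

    def __init__(self, size: int):
        self.slots = [None] * size
        self.head = 0   # next slot to write (touched only by the producer)
        self.tail = 0   # next slot to read (touched only by the consumer)

    def enqueue(self, item) -> bool:
        nxt = (self.head + 1) % len(self.slots)
        if nxt == self.tail:
            return False            # ring is full; caller decides what to drop
        self.slots[self.head] = item
        self.head = nxt
        return True

    def dequeue_burst(self, max_items: int) -> list:
        burst = []
        while self.tail != self.head and len(burst) < max_items:
            burst.append(self.slots[self.tail])
            self.tail = (self.tail + 1) % len(self.slots)
        return burst

def poll_loop(rx: Ring, tx: Ring, process, burst_size: int = 4, max_polls: int = 100) -> int:
    """Poll rx instead of waiting for interrupts; finish each packet before the next."""
    handled = 0
    for _ in range(max_polls):      # a dedicated core would spin forever; bounded here
        for pkt in rx.dequeue_burst(burst_size):
            tx.enqueue(process(pkt))    # run to completion: no hand-off to another thread
            handled += 1
    return handled

rx, tx = Ring(16), Ring(16)
for sample in range(10):
    rx.enqueue(sample)

print("handled:", poll_loop(rx, tx, lambda pkt: pkt * 2))
print("output: ", tx.dequeue_burst(16))
```

For DSP use, the `process` callback is where a signal processing kernel would replace the packet inspection step.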



#### ATCA Dual "Sandy Bridge" Packet Processing Blade



## Let's Put it All Together



